Development of patient-reported outcome for adult spinal deformity: validation study

Adult spinal deformity (ASD) is a complex condition that combines scoliosis, kyphosis, pain, and postoperative range of motion limitation. The lack of a scale that can successfully capture this complex condition is a clinical challenge. We aimed to develop a disease-specific scale for ASD. The study included 106 patients (mean age, 68 years; 89 women) with ASD. We selected 29 questions that could be useful in assessing ASD and asked the patients to answer them. Factor analysis identified two factors: the main symptom and the collateral symptom. The main symptom consisted of 10 questions and assessed activities of daily living (ADL), pain, and appearance. The collateral symptom consisted of five questions assessing ADL limited by reduced range of motion. Cronbach’s alpha was 0.90 and 0.84, respectively. The Spearman’s correlation coefficient between the change in the main symptom and satisfaction was 0.48 (p < 0.001). The effect size (Cohen’s d) for the comparison between preoperative and postoperative scores was 1.09 for the main symptom and 0.65 for the collateral symptom. In conclusion, we have developed a validated disease-specific scale for ASD that can simultaneously evaluate the benefits and limitations of ASD surgery with enough responsiveness in clinical practice.


Patients
This study was a multicenter, self-report questionnaire survey conducted at two spine centers. In total, 106 patients were included: 97 patients who underwent long fusion surgery between 2007 and 2020 and nine patients who were undergoing conservative treatment and considering surgery for spinal deformity. The conservative patients had spinal deformities but preferred conservative treatment because their clinical symptoms were milder than those of the operative patients. A questionnaire consisting of 29 questions was mailed to these patients, and they were asked to complete and return it. Patients who had undergone surgery were asked to answer for both their preoperative and postoperative conditions. Conservatively treated patients were asked to answer questions about their current condition. A five-point satisfaction rating scale for surgery and the Short-Form-8 (physical component summary, PCS; mental component summary, MCS) were enclosed for criterion-related validation.
Of the 106 patients, eight did not receive the mailing due to a change of address. The 98 patients (89 surgical patients) who responded were included in the study (Fig. 2). Long fusion was defined as the fusion of five or

Selection of 29 questions
The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative aims to improve the selection of patient-reported outcome measures (PROMs) in research and clinical practice, and has published guidelines for this purpose. We conducted this study in accordance with the COSMIN guidelines 10. Content validity is the most important measurement property of a PROM. It is the degree to which the content of an instrument adequately reflects the construct to be measured. The criteria for content validity include the relevance, comprehensiveness, and comprehensibility of the PROM for the target population. We conducted a literature search to select questions relevant to ASD. We assumed that ADL, appearance, pain, mental health, and satisfaction would be the assessment items necessary to capture the disease concept of ASD 1,3,6,11-13.
From these categories, we extracted 114 items that were considered useful for assessing ASD (Fig. 3). Sexual life, although an important item, was not included because of the expected large number of non-responses 28. To ensure the relevance of the questions to ASD, eight surgeons with extensive experience in operating on patients with ASD scored each of the 114 items from 0 to 3 according to its importance. We used the total score as a reference and selected 29 question items after discussion among the senior surgeons (Table 2). The detailed wording was partially modified as appropriate. To examine comprehensibility, the developed questionnaire was given to three patients and one nurse, who reviewed the items in terms of text, meaning, and ambiguity and provided feedback. Responses were on a five-point scale 29, with an additional free-text field.

Ethics statement
The study was conducted in accordance with the ethical standards of the Declaration of Helsinki. The study was approved by the local ethical review board (Osaka University Hospital Ethics Review Committee, No. 11360). Written informed consent was obtained from each patient.

Statistical analysis
The COSMIN guidelines introduce classical test theory and Rasch analysis for construct validation. We used classical test theory and factor analysis. Factor analysis was used to reduce and group the questions in order to create a valid, simple, and easy-to-use questionnaire. An exploratory factor analysis was performed using the maximum likelihood method on data from a total of 98 patients, comprising 89 postoperative responses and nine conservative cases. The number of factors was determined using the scree method. Because correlations between factors can be assumed, oblique rotation was performed using the Promax method. Finally, reliability was evaluated in terms of internal consistency using Cronbach's coefficient alpha.
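
As an illustration of the reliability step, Cronbach's alpha can be computed from a respondents-by-items table as sketched below in Python. This is a minimal sketch under stated assumptions, not the SPSS procedure used in this study: the function name cronbach_alpha and the response data are hypothetical, and sample variances (n - 1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of numeric scores."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(r) != n_items for r in rows):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(col) for col in zip(*rows)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(r) for r in rows])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to three five-point items.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha approaches 1 when the items rise and fall together across respondents, which is the sense in which it reflects internal consistency.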

Score calculation formula
Factor score coefficients obtained from the factor analysis were used as a reference to adjust the coefficients so that the scale's total score ranged from 0 to 100. Specifically, individual items were weighted so that the difference between the minimum and maximum factor scores, depending on the choice of response, was approximately 100 14. However, we gave greater weight to those questions that clinicians deemed important. On the resulting scale, 0 represented a limited health status and 100 represented an excellent health status.
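
The weighting idea can be sketched as follows. This is an illustration only: the integer weights are hypothetical and are not the published coefficients, the raw answers are assumed to be coded 1 (no restriction) to 5 (most restriction), and the sketch requires the weighted range to be exactly 100 for simplicity, whereas the text above describes it as approximately 100.

```python
def scale_score(responses, weights, n_options=5):
    """Turn raw item answers into a 0-100 score, where 100 is the best status.

    responses: raw answers coded 1..n_options, higher = more restriction (assumed).
    weights:   integer item weights chosen so the weighted range equals 100.
    """
    if len(responses) != len(weights):
        raise ValueError("one weight is needed per item")
    if sum(weights) * (n_options - 1) != 100:
        raise ValueError("weights must span exactly 100 points")
    penalty = 0
    for answer, weight in zip(responses, weights):
        if not 1 <= answer <= n_options:
            raise ValueError(f"answer {answer} is outside 1..{n_options}")
        # Each step away from the best answer costs `weight` points.
        penalty += weight * (answer - 1)
    return 100 - penalty


# Hypothetical weights for a 10-item domain; three items deemed more
# important by clinicians carry weight 4. The weights sum to 25, so
# 25 * (5 - 1) = 100.
HYPOTHETICAL_WEIGHTS = [4, 4, 2, 2, 2, 2, 2, 2, 4, 1]

best = scale_score([1] * 10, HYPOTHETICAL_WEIGHTS)
worst = scale_score([5] * 10, HYPOTHETICAL_WEIGHTS)
print(best, worst)
```

Raising the weight of one item, as was done for the clinically important questions, requires lowering others if the 0 to 100 range is to be preserved.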

Comparison of scores and responsiveness
We compared the scores of the created scale, the PCS, and the MCS before and after surgery (paired t-test). Similarly, we compared the scale scores between the operated and conservative groups (unpaired t-test). We calculated Cohen's d effect size by taking the difference between two means and dividing it by the standard deviation of the data. Cohen's d effect size was used to evaluate the internal responsiveness of the scales. Next, we calculated Spearman's correlation coefficients between the five satisfaction levels and the amount of score change on each scale. The external responsiveness of the scales was evaluated using Spearman's correlation coefficients. An effect size of 0.2-0.49 was considered small, an effect size of 0.5-0.79 was considered moderate, and an effect size of 0.80 or greater was considered large 30. A correlation coefficient of 0.2-0.39 was considered weak, a correlation coefficient of 0.4-0.69 was considered moderate, and a correlation coefficient of 0.70 or greater was considered strong. A p-value < 0.05 was considered statistically significant for two-tailed tests. SPSS Statistics (version 20; IBM, Armonk, NY, USA) was used for statistical analysis.
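
The effect size calculation and its interpretation bands can be sketched in Python as follows. The text does not specify which standard deviation serves as the denominator, so the sketch assumes the pooled standard deviation of the two conditions; the function names, the "negligible" label for values below 0.2, and the input values are hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d from summary statistics, using a pooled SD (an assumption)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


def classify_effect(d):
    """Apply the bands given in the text: 0.2 small, 0.5 moderate, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed; the text defines no band below 0.2


# Hypothetical scores: 60 +/- 10 after treatment versus 50 +/- 10 before.
d = cohens_d(60, 10, 50, 10)
print(f"d = {d:.2f} ({classify_effect(d)})")
```

Other denominators, such as the baseline standard deviation or the standard deviation of the change scores, are also in common use and give different values for the same data.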

External validation
We recruited new patients with ASD from another institution for external validation. We applied our ASD disease-specific scale to these patients and compared the results with the internal validation data.

Response of the patients
The results of the responses to each question are shown in Table 4, and the correlation coefficients are shown in Table 5. Seven patients stated in the free-text field that they did not perform heavy housework (Q23). Therefore, Q23 heavy housework was deemed inappropriate and excluded from the factor analysis. Regarding Q16 walking distance, four patients answered that they did not know the distance. Because there was a strong correlation between Q16 walking distance and Q17 walking time, we considered that Q17 walking time could be substituted for Q16 walking distance and excluded Q16. Factor analysis was conducted on the remaining 27 questions.

Factor analysis
The two-factor solution was adopted based on the decay of the eigenvalues (scree criterion). The proportion of the total variance of the 27 items explained by the two factors before rotation was 47%. The items were ordered by factor loadings (Table 6). The first factor was named the main symptom because many of its items were related to the patient's primary complaints, such as the ability to do housework and walk, including Q25 dishwashing, Q21 laundry, Q20 shelving, and Q17 walking. The loadings for Q1 appearance, Q2 back pain, and Q29 anxiety were relatively low, but these items were included because we considered them essential. We selected Q19 ride, Q24 garbage disposal, and Q15 standing as the remaining questions, according to factor loadings. Because Q22 light housework was strongly correlated with Q25 dishwashing (r = 0.82) and Q21 laundry (r = 0.80) and was considered to refer to the same thing, we excluded Q22. A total of 10 question items (Q1 appearance, Q2 back pain, Q15 standing, Q17 walking, Q19 ride, Q20 shelving, Q21 laundry, Q24 garbage disposal, Q25 dishwashing, Q29 anxiety) were used for the main symptom factor.
The second factor was named the collateral symptom because many items were related to postoperative limitation of movement, such as Q12 socks wearing and Q9 picking up. Because Q11 pants and Q12 socks were highly correlated (r = 0.76), we excluded Q11, considering that Q12 socks could be substituted for it. According to factor loadings, we selected five question items (Q7 standing up floors, Q8 toilet, Q9 picking up, Q10 washing, Q12 socks) for the collateral symptom factor.

Internal consistency
The Cronbach's alpha coefficient was 0.90 for the main symptom and 0.84 for the collateral symptom.

Calculation of scores
The factor score coefficients were used as weighting coefficients for each question and were rounded to whole numbers so that the total scale score was distributed from 0 to 100. Because Q1 appearance and Q2 back pain are particularly important items, we gave them the same coefficients as Q25 dishwashing, which had a higher factor score coefficient. The best symptom status was set to 100 and the worst to 0. The calculation formulas are provided in Supplementary File 1.

Score change
The scores calculated based on the above formula are shown in Table 7. Comparing the operative and conservative groups, the main symptom score of the operative group was 47 ± 21 preoperatively, while that of the conservative group was 63 ± 15. The operative group had significantly worse preoperative main symptom scores than the conservative group (p = 0.029). However, the main symptom score of the operative group significantly improved to 70 ± 22 after surgery (p < 0.0001), exceeding that of the conservative group. As a result of this surgical improvement, there was no significant difference between the postoperative main symptom score of the operative group and the main symptom score of the conservative group (p = 0.3).
The mean collateral symptom score in the operative group worsened from 76 ± 25 preoperatively to 60 ± 25 postoperatively (p < 0.0001). The preoperative collateral symptom score in the operative group was significantly worse than that in the conservative group (92 ± 12; p = 0.005).

Effect size
The effect size measured by Cohen's d for the comparison of preoperative and postoperative scores was 1.09 for the main symptom, indicating a large effect (Table 7). In the same comparison, the effect size of the collateral symptom was 0.65 (moderate), and that of the PCS was 1.26 (large).
In a comparison of operative and conservative groups, the effect size was 0.77 for the main symptom and 0.67 for the collateral symptom, indicating a moderate effect size.

Correlation coefficient
The Spearman's correlation coefficient between satisfaction and the amount of score change was 0.48 (p < 0.001) for the main symptom and 0.38 for the PCS, both showing a moderate correlation (Table 8). The correlation coefficient between the main symptom and the PCS was 0.43, indicating a moderate correlation (p = 0.002).
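
For readers who wish to reproduce this type of analysis, Spearman's coefficient can be sketched as a Pearson correlation computed on ranks, with tied values sharing their average rank. Ties matter here because satisfaction has only five levels. The function names and the data below are hypothetical, and the sketch is not the SPSS routine used in this study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rank correlation with average ranks for ties."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: satisfaction level (1-5) and score change per patient.
satisfaction = [5, 4, 4, 3, 2, 5, 1, 3]
score_change = [30, 22, 10, 15, -5, 18, -12, 4]
print(f"rho = {spearman(satisfaction, score_change):.2f}")
```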

Ceiling and floor effects
The main symptom had no floor or ceiling effect either preoperatively or postoperatively (Figs. 4, 5). Conversely, the collateral symptom had a ceiling effect preoperatively, but no floor effect postoperatively (Figs. 6, 7).
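
One way to quantify such effects, instead of reading them from a histogram, is to compute the share of respondents at each extreme of the scale. The sketch below assumes a 0 to 100 score and flags an effect when more than 15% of respondents sit at an extreme. That cutoff is a frequently used convention and is an assumption here, because this study does not state its criterion. The function name and the scores are hypothetical.

```python
def floor_ceiling(scores, lowest=0, highest=100, cutoff=0.15):
    """Share of respondents at the scale extremes and whether it exceeds cutoff."""
    if not scores:
        raise ValueError("no scores given")
    n = len(scores)
    floor_share = sum(1 for s in scores if s <= lowest) / n
    ceiling_share = sum(1 for s in scores if s >= highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > cutoff,      # cutoff is an assumed convention
        "ceiling_effect": ceiling_share > cutoff,
    }


# Hypothetical scores in which three of ten respondents reach the maximum.
example = floor_ceiling([100, 100, 100, 80, 60, 40, 20, 10, 50, 70])
print(example)
```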

External validation
For external validation of the disease-specific scale for ASD that we had created, we added a new sample of 30 surgical patients with ASD from another facility. This scale consisted of 10 main symptom and 5 collateral symptom questions, as described above. Total scores were calculated using the above formulas (Supplementary File 1). The SF-8 and satisfaction scale were enclosed, as they were when the scale was created. Twenty-five patients responded (Table 9). There was a significant difference in age and fixation range between the 25 patients for external validation and the 89 patients for internal validation. However, no other background information was significantly different. The main symptom improved from 56 ± 19 preoperatively to 76 ± 19 postoperatively with an effect size of 1.05. The collateral symptom worsened from 75 ± 23 preoperatively to 64 ± 24 postoperatively with an effect size of 0.48. In both domains, the effect size was not different from the effect size at the time of scale creation, indicating the robustness of the scale.
AIS, which is less functionally impaired and, therefore, is less relevant for ASD, which seeks to restore pain and quality of life. Faraj et al. stated that there was an overlap between the two outcomes and a need to develop a core outcome set that is more specific to the assessment of ASD.
Mannion et al. performed a factor analysis of the SRS-22 on ASD patients 34. They found a poor fit for four questions on the SRS-22: Q3 (nervous person), Q14 (personal relationship), Q15 (financial difficulties), and Q17 (sick days). They recommended the deletion of these four questions.
Zaina et al. compared the newly developed Italian spine youth quality of life (ISYQOL) with the SRS-22 using Rasch analysis 35. According to this group, Q15 (financial difficulties) in the SRS-22 was a poor fit, and they recommended using the remaining 21 items. By excluding this item, the revised SRS-22 showed construct validity comparable with the ISYQOL.
Scheer et al. devised a patient-generated index, a questionnaire that patients were asked to fill out freely 36. The top 10 concerns of patients with ASD were walking, activities, posture, pain, sports, housework, relationships, gardening, sleeping, and traveling. The 29 items we selected covered almost all of these concerns. Regarding sports, some patients in this study indicated in the free-text field that they did not engage in such activities. The term "sports" covers an extensive range, from light gymnastics and walking to running and swimming. We did not select Q26 sports because its factor loading was small and because different people perceived this item differently.
Housework activities, conversely, are important for patients with ASD. In particular, as ASD is more common in older women, it is essential to include kitchen activities in the assessment. A kitchen elbow sign, for example, is a skin abnormality that develops on the elbow when working in the kitchen, as the patient must rest her elbow on a table to maintain a standing position 36. In the current study, the factor loadings for washing dishes and laundry were large. Kitchen elbow sign is especially likely to occur when washing dishes because both hands are used, and the patient cannot hold a cane or walker during the task. The large factor loadings of these two items suggest that kyphosis makes it difficult for patients with ASD to maintain an intermediate or dorsiflexed position.
Restriction of lumbar spine mobility after long fusion is a concern for both surgeons and patients 6. Ishikawa et al. studied ADL in 36 long fusion patients 13. They found that patients after long fusion performed better than preoperatively in activities such as sleeping supine, standing upright, vacuuming, doing laundry, and reaching for objects placed at heights. Conversely, strenuous activities such as shoveling snow worsened postoperatively. Overall surgical satisfaction was 70%. Their report suggests that long fusion surgery for ASD requires evaluating both positive and negative aspects.
Hart et al. investigated functional limitations due to lumbar stiffness in 62 patients 5. They reported that 91% of the patients were satisfied with the trade-off between postoperative improvement in back pain and associated restriction of motion. In the present study, 73% of the patients were satisfied with their surgery. Their study included 24 cases (40%) of one vertebral fusion and only 19 cases (31%) of five or more vertebral fusion. Our patients had five or more intervertebral fusions, with an average of 10 fused vertebrae. This difference in fixation levels may have influenced the difference in satisfaction.
One of the advantages of our scoring system was that factor analysis divided the questions into two domains. Surgery for ASD improved ADLs through improved pain and posture, but also caused movement limitations. Simply adding up these improvements and any worsening could result in a net change of zero. By dividing the score into two domains, we could assess each symptom, with each domain having appropriate responsiveness. This represents two aspects of surgery for ASD and is a necessary component for improving treatment efficacy and explaining surgery to patients.
Another strength of this study was that the patients had long fusions averaging 10 fixed intervertebral spaces, and 78% underwent fusion from the pelvis to the thoracic spine. Previous studies have focused on short lumbar intervertebral fusion procedures. Our patients are a more suitable population to assess ADLs for long fusion, especially as including L5/S in the fusion range would result in greater limitation.
There were some limitations in this study. The number of patients was limited. Factor analysis was performed on 98 patients, slightly fewer than 100. However, considering that two factors were found, this could be considered sufficient. Because this study was conducted in one country, the results may not be generalizable to other countries. The burden of housework activities may differ between developed and developing countries. Reliability was assessed by internal consistency, and a test-retest was not conducted in this study. The preoperative score was based on memory and there may have been recall bias. Patients with a longer follow-up period become more accustomed to their current symptoms and may underestimate the difference between their preoperative and current conditions. These issues should be addressed in future studies.

Conclusion
We developed a disease-specific outcome for ASD using factor analysis. This scale is the first scientifically validated measure that could simultaneously assess the benefits and limitations of ASD surgery. This tool can complement existing outcomes and will be useful for explaining surgery to patients and for future clinical trials.

Figure 1. Schematic of changes in a typical long fusion surgery. Preoperatively, the patient cannot maintain posture due to kyphotic deformity. Postoperatively, the patient can maintain posture, but has limited range of motion.

Figure 3. Flowchart of question item selection.

Figure 4. Histogram of the preoperative scores of the main symptom. The main symptom has no floor or ceiling effect.

Figure 5. Histogram of the postoperative scores of the main symptom. The main symptom has no floor or ceiling effect.

Figure 6. Histogram of the preoperative scores of the collateral symptom. The collateral symptom has no floor or ceiling effect.

Figure 7. Histogram of the postoperative scores of the collateral symptom. The collateral symptom has no floor or ceiling effect.

Table 1. Review list of the questionnaires.

Table 2. Twenty-nine items for factor analysis selected after discussion among the surgeons.

Table 3. Demographics of the study patients. SD, standard deviation.

Table 4. Mean and standard deviation of raw data for each item. Raw data means the score points of the answer options. For raw data, higher numbers indicate more activity restrictions. SD, standard deviation.

Table 5. Spearman correlation coefficients between each item for postoperative answers.

Table 6. Factor loadings and factor score coefficients.

Table 7. Comparison of the final version scores between operative cases and conservative cases, and between preoperative condition and postoperative condition. Scores are shown as mean ± SD. N.A., not available; SD, standard deviation.

Table 8. Spearman's correlation coefficients between change scores and 5-point satisfaction rating scale.